Amendments to the Claims

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 



1 1. (currently amended): A system for grouping clusters of

2 semantically scored documents electronically stored in a data corpus, comprising: 

3 a scoring module determining a score, which is assigned to at least one 

4 concept that has been extracted from a plurality of electronically-stored 

5 documents, wherein the score is based on at least one of a frequency of 

6 occurrence of the at least one concept within at least one such document, a 

7 concept weight, a structural weight, and a corpus weight; 

8 a clustering module forming clusters of the documents by evaluating the 

9 score for the at least one concept of each document for a best fit to the clusters 

10 and assigning each document to the cluster with the best fit; and 

11 a threshold module determining similarities between the documents

12 grouped into each cluster based on the center of the cluster and the scores 

13 assigned to each of the at least one concepts in each such document, dynamically 



14 determining a threshold for each cluster [[based on]] as a function of the similarities

15 [[between the documents grouped into the cluster and a center of the cluster]], and

16 identifying and reassigning those documents having the similarities falling outside

17 the threshold. 
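
As an illustration only, and not as part of the claim language, the threshold step of Claim 1 can be sketched in Python. The claim requires only that the threshold be determined dynamically as a function of the similarities; the specific rule used here (mean similarity minus `k` standard deviations), the cosine measure, and the names `cosine`, `find_outliers`, and `plan_reassignments` are assumptions made for the sketch.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_outliers(docs, center, k=1.0):
    """Return (threshold, indices of documents whose similarity falls below it).

    Assumption: threshold = mean - k * standard deviation of the
    document-to-center similarities within this one cluster.
    """
    sims = [cosine(d, center) for d in docs]
    threshold = mean(sims) - k * pstdev(sims)
    return threshold, [i for i, s in enumerate(sims) if s < threshold]


def plan_reassignments(clusters, centers, k=1.0):
    """List (from_cluster, doc_index, to_cluster) moves for outlier documents."""
    moves = []
    for c, docs in enumerate(clusters):
        if not docs:
            continue  # an empty cluster has no similarities to evaluate
        _, outliers = find_outliers(docs, centers[c], k)
        for d in outliers:
            # Reassign to whichever cluster center is the best fit.
            best = max(range(len(centers)),
                       key=lambda j: cosine(docs[d], centers[j]))
            if best != c:
                moves.append((c, d, best))
    return moves
```

Because the threshold is recomputed from each cluster's own similarities, a tight cluster tolerates less deviation from its center than a loose one.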



1 2. (original): A system according to Claim 1, further comprising: 

2 the scoring module calculating the score as a function of a summation of 

3 at least one of the frequency of occurrence, the concept weight, the structural 

4 weight, and the corpus weight of the at least one concept. 

1 3. (original): A system according to Claim 2, further comprising: 
2 a compression module compressing the score through logarithmic

3 compression. 

1 4. (original): A system according to Claim 1, further comprising:

2 the scoring module calculating the concept weight as a function of a 

3 number of terms comprising the at least one concept. 

1 5. (original): A system according to Claim 1, further comprising:

2 the scoring module calculating the structural weight as a function of a 

3 location of the at least one concept within the at least one such document. 

1 6. (original): A system according to Claim 1, further comprising: 

2 the scoring module calculating the corpus weight as a function of a 

3 reference count of the at least one concept over the plurality of documents. 

1 7. (currently amended): A system according to Claim 1, further

2 comprising: 

3 the scoring module forming the score assigned to the at least one concept 



4 to a normalized score vector for each such document, determining [[a]] each such 

5 similarity between the normalized score vector for each such document as an 

6 inner product of each normalized score vector, and applying the similarity to the 

7 best fit criterion. 
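
As an illustration only, and not as part of the claim language, the similarity of Claim 7 can be sketched in Python: each document's concept scores are formed into a normalized score vector, and the similarity between two documents is the inner product of their normalized vectors. Representing a document as a dictionary from concept to score, and the names `normalize` and `similarity`, are assumptions made for the sketch.

```python
import math


def normalize(scores):
    """Scale a {concept: score} mapping to unit length (empty if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {c: s / norm for c, s in scores.items()} if norm else {}


def similarity(a, b):
    """Inner product of two normalized score vectors.

    Concepts absent from one document contribute zero to the product.
    """
    return sum(w * b.get(c, 0.0) for c, w in a.items())
```

For unit-length vectors the inner product equals the cosine of the angle between them, so documents sharing no concepts have a similarity of zero.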



1 8. (original): A system according to Claim 1, further comprising: 

2 the clustering module evaluating a set of candidate seed documents 

3 selected from the plurality of documents, identifying a set of seed documents by 

4 applying the score for the at least one concept to a best fit criterion for each such 

5 candidate seed document, and basing the best fit criterion on the score of each 

6 such seed document. 

1 9. (currently amended): A method for grouping clusters of 

2 semantically scored documents electronically stored in a data corpus, comprising: 
3 determining a score, which is assigned to at least one concept that has

4 been extracted from a plurality of electronically-stored documents, wherein the 

5 score is based on at least one of a frequency of occurrence of the at least one 

6 concept within at least one such document, a concept weight, a structural weight, 

7 and a corpus weight; 

8 forming logically-grouped clusters of the documents by evaluating the 

9 score for the at least one concept of each document for a best fit to the clusters 

10 and assigning each document to the cluster with the best fit; 

11 determining similarities between the documents grouped into each cluster 

12 based on the center of the cluster and the scores assigned to each of the at least 

13 one concepts in each such document; 

14 dynamically determining a threshold for each cluster [[based on]] as a

15 function of the similarities [[between the documents grouped into the cluster and a

16 center of the cluster]]; and

17 identifying and reassigning those documents having the similarities falling

18 outside the threshold.

1 10. (original): A method according to Claim 9, further comprising:

2 calculating the score as a function of a summation of at least one of the 

3 frequency of occurrence, the concept weight, the structural weight, and the corpus 

4 weight of the at least one concept. 

1 11. (original): A method according to Claim 10, further comprising:

2 compressing the score through logarithmic compression. 

1 12. (original): A method according to Claim 9, further comprising: 

2 calculating the concept weight as a function of a number of terms 

3 comprising the at least one concept. 

1 13. (original): A method according to Claim 9, further comprising: 

2 calculating the structural weight as a function of a location of the at least 

3 one concept within the at least one such document. 
1 14. (original): A method according to Claim 9, further comprising:

2 calculating the corpus weight as a function of a reference count of the at 

3 least one concept over the plurality of documents. 

1 15. (currently amended): A method according to Claim 9, further 

2 comprising: 

3 forming the score assigned to the at least one concept to a normalized 

4 score vector for each such document; 

5 determining [[a]] each such similarity between the normalized score 

6 vector for each such document as an inner product of each normalized score 

7 vector; and 

8 applying the similarity to the best fit criterion. 

1 16. (original): A method according to Claim 9, further comprising: 

2 evaluating a set of candidate seed documents selected from the plurality of 

3 documents; 

4 identifying a set of seed documents by applying the score for the at least 

5 one concept to a best fit criterion for each such candidate seed document; and 

6 basing the best fit criterion on the score of each such seed document. 

1 17. (currently amended): A computer-readable storage medium 

2 holding code for grouping clusters of semantically scored documents 

3 electronically stored in a data corpus, comprising: 

4 code for determining a score, which is assigned to at least one concept that 

5 has been extracted from a plurality of electronically-stored documents, wherein 

6 the score is based on at least one of a frequency of occurrence of the at least one 

7 concept within at least one such document, a concept weight, a structural weight, 

8 and a corpus weight; 

9 code for forming logically-grouped clusters of the documents by 

10 evaluating the score for the at least one concept of each document for a best fit to 

11 the clusters and assigning each document to the cluster with the best fit;

12 code for determining similarities between the documents grouped into

13 each cluster based on the center of the cluster and the scores assigned to each of 

14 the at least one concepts in each such document; 

15 code for dynamically determining a threshold for each cluster [[based on]] as

16 a function of the similarities [[between the documents grouped into the cluster and

17 a center of the cluster]]; and

18 code for identifying and reassigning those documents having the

19 similarities falling outside the threshold.

1 18. (currently amended): A system for providing efficient document 

2 scoring of concepts within and clustering of documents in an electronically-stored 

3 document set, comprising: 

4 a scoring module scoring a document in an electronically-stored document 

5 set, comprising: 

6 a frequency module determining a frequency of occurrence of at 

7 least one concept within a document; 

8 a concept weight module analyzing a concept weight reflecting a 

9 specificity of meaning for the at least one concept within the document; 

10 a structural weight module analyzing a structural weight reflecting 

11 a degree of significance based on structural location within the document for the

12 at least one concept; 

13 a corpus weight module analyzing a corpus weight inversely 

14 weighing a reference count of occurrences for the at least one concept within the 

15 document; and 

16 a scoring evaluation module evaluating a score to be associated 

17 with the at least one concept as a function of the frequency, concept weight, 

18 structural weight, and corpus weight; and 

19 a clustering module grouping the documents by score into a plurality of 

20 clusters, comprising: 

21 a cluster seed module identifying candidate seed documents, which 

22 are each assigned as a seed document into a cluster with a center most similar to 
23 the seed document, and assigning each non-seed document to the cluster with the

24 best fit; and 

25 a threshold module relocating outlier documents, comprising 

26 determining similarities between the documents grouped into each cluster based 

27 on the center of the cluster and the scores assigned to each of the at least one 

28 concepts in each such document, dynamically determining a threshold for each 

29 cluster [[based on]] as a function of the similarities [[between the documents grouped

30 into the cluster and a center of the cluster]], and identifying and reassigning the

31 documents with the similarities falling outside the threshold.

1 19. (previously presented): A system according to Claim 18, further 

2 comprising: 

3 the scoring module evaluating the score in accordance with the formula: 

4 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

5 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

6 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

7 comprises the corpus weight for occurrence $j$ of concept $i$.

1 20. (previously presented): A system according to Claim 19, further 

2 comprising: 

3 the concept weight module evaluating the concept weight in accordance 

4 with the formula: 

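5 $cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$

6 where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises a number of terms for

7 occurrence $j$ of each such concept $i$.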
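
As an illustration only, and not as part of the claim language, the piecewise concept weight of Claim 20 can be sketched in Python. The sketch assumes the breakpoints are read as 1 to 3 terms, 4 to 6 terms, and 7 or more terms; the function name `concept_weight` and the rejection of counts below one are likewise assumptions.

```python
def concept_weight(num_terms):
    """Concept weight as a function of the number of terms in a concept.

    The weight rises from 0.5 (one term) to 1.0 (three terms), falls back
    toward 0.5 (six terms), and is fixed at 0.25 for seven or more terms.
    """
    if num_terms < 1:
        raise ValueError("a concept must comprise at least one term")
    if num_terms <= 3:
        return 0.25 + 0.25 * num_terms
    if num_terms <= 6:
        return 0.25 + 0.25 * (7 - num_terms)
    return 0.25
```

Under this reading, concepts of three or four terms receive the full weight of 1.0, while single terms and very long phrases are discounted.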

1 21. (previously presented): A system according to Claim 19, further

2 comprising: 
the structural weight module evaluating the structural weight in
accordance with the formula:

where $sw_{ij}$ comprises the structural weight for occurrence $j$ of each such concept $i$.

22. (previously presented): A system according to Claim 19, further 
comprising: 

the corpus weight module evaluating the corpus weight in accordance with 
the formula: 



where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for
occurrence $j$ of each such concept $i$, T comprises a total number of reference
counts of documents in the document set, and M comprises a maximum reference 
count of documents in the document set. 

23. (previously presented): A system according to Claim 19, further 
comprising: 

a compression module compressing the score in accordance with the 
formula:

$S'_i = \log(S_i + 1)$

where $S'_i$ comprises the compressed score for each such concept $i$.

24. (original): A system according to Claim 18, further comprising: 
a global stop concept vector cache maintaining concepts and terms; and 
3 a filtering module filtering selection of the at least one concept based on

4 the concepts and terms maintained in the global stop concept vector cache. 

1 25. (original): A system according to Claim 18, further comprising: 

2 a parsing module identifying terms within at least one document in the 

3 document set, and combining the identified terms into one or more of the 

4 concepts. 

1 26. (original): A system according to Claim 25, further comprising: 

2 the parsing module structuring each such identified term in the one or 

3 more concepts into canonical concepts comprising at least one of word root, 

4 character case, and word ordering. 

1 27. (original): A system according to Claim 25, wherein at least one of 

2 nouns, proper nouns and adjectives are included as terms. 

1 28. (original): A system according to Claim 18, further comprising: 

2 a plurality of candidate seed documents; 

3 a similarity module determining a similarity between each pair of a 

4 candidate seed document and a cluster center; 

5 a clustering module designating each such candidate seed document 



6 separated from substantially all cluster centers with such similarity being 

7 sufficiently distinct as a seed document, and grouping each such candidate seed 

8 document not being sufficiently distinct into a cluster with a nearest cluster 

9 center. 



1 29. (original): A system according to Claim 28, further comprising: 

2 a plurality of non-seed documents; 

3 the similarity module determining the similarity between each non-seed 

4 document and each cluster center; and 

5 the clustering module grouping each such non-seed document into a 

6 cluster having a best fit, subject to a minimum fit criterion. 
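
As an illustration only, and not as part of the claim language, the seed selection of Claim 28 and the non-seed grouping of Claim 29 can be sketched in Python. The sketch assumes unit-length score vectors, treats each seed document's vector as its cluster center, and uses two hypothetical parameters that the claims do not specify: `seed_sim_max` (a candidate at or above this similarity to an existing center is not sufficiently distinct) and `min_fit` (the minimum fit criterion).

```python
def dot(a, b):
    """Inner product of two equal-length, unit-normalized score vectors."""
    return sum(x * y for x, y in zip(a, b))


def group_documents(candidates, non_seeds, seed_sim_max=0.5, min_fit=0.2):
    """Return (centers, clusters, unassigned) for the given documents."""
    centers, clusters = [], []
    for doc in candidates:
        sims = [dot(doc, c) for c in centers]
        if not sims or max(sims) < seed_sim_max:
            # Sufficiently distinct from every center: becomes a seed.
            centers.append(doc)
            clusters.append([doc])
        else:
            # Not distinct: grouped into the cluster with the nearest center.
            clusters[sims.index(max(sims))].append(doc)
    unassigned = []
    for doc in non_seeds:
        sims = [dot(doc, c) for c in centers]
        if sims and max(sims) >= min_fit:
            clusters[sims.index(max(sims))].append(doc)
        else:
            unassigned.append(doc)  # fails the minimum fit criterion
    return centers, clusters, unassigned
```

Documents that fail the minimum fit are returned separately rather than forced into a poorly matching cluster.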
1 30. (original): A system according to Claim 29, further comprising:

2 a normalized score vector for each document comprising the score 

3 associated with the at least one concept for each such concept occurring within 

4 the document; and 

5 the similarity module determining the similarity as a function of the 

6 normalized score vector associated with the at least one concept for each such 

7 document. 

1 31. (previously presented): A system according to Claim 30, further 

2 comprising: 

3 the similarity module calculating the similarity in accordance with the 

4 formula: 

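5 $\cos \sigma_{AB} = \dfrac{\langle \vec{S}_A \cdot \vec{S}_B \rangle}{|\vec{S}_A| \, |\vec{S}_B|}$

6 where $\cos \sigma_{AB}$ comprises a similarity between a document A and a document B,

7 $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for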

8 document B. 

1 Claims 32-34 (canceled). 

1 35. (currently amended): A method for providing efficient document 

2 scoring of concepts within and clustering of documents in an electronically-stored 

3 document set, comprising: 

4 scoring a document in an electronically-stored document set, comprising: 

5 determining a frequency of occurrence of at least one concept 

6 within a document; 

7 analyzing a concept weight reflecting a specificity of meaning for 

8 the at least one concept within the document; 

9 analyzing a structural weight reflecting a degree of significance 
10 based on structural location within the document for the at least one concept; 
11 analyzing a corpus weight inversely weighing a reference count of

12 occurrences for the at least one concept within the document; and 

13 evaluating a score to be associated with the at least one concept as 

14 a function of the frequency, concept weight, structural weight, and corpus weight; 

15 and 

16 grouping the documents by score into a plurality of clusters, comprising: 

17 identifying candidate seed documents, which are each assigned as 

18 a seed document into a cluster with a center most similar to the seed document;

19 assigning each non-seed document to the cluster with the best fit;

20 relocating outlier documents, comprising: 

21 determining similarities between the documents grouped into each 

22 cluster based on the center of the cluster and the scores assigned to each of the at 

23 least one concepts in each such document; 

24 dynamically determining a threshold for each cluster [[based on]] as a

25 function of the similarities [[between the documents grouped into the cluster and a

26 center of the cluster]]; and

27 identifying and reassigning the documents with the similarities 

28 falling outside the threshold. 

1 36. (previously presented): A method according to Claim 35, further 

2 comprising: 

3 evaluating the score in accordance with the formula: 

4 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

5 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

6 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

7 comprises the corpus weight for occurrence $j$ of concept $i$.
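
As an illustration only, and not as part of the claim language, the score of Claim 36 can be sketched in Python as a sum, over the occurrences of one concept, of the product of the frequency and the three weights. The tuple layout `(f, cw, sw, rw)`, the function name `concept_score`, and the range check on the weights (each assumed to be greater than 0 and at most 1) are assumptions made for the sketch.

```python
def concept_score(occurrences):
    """Sum f * cw * sw * rw over the occurrences of a single concept.

    occurrences: iterable of (f, cw, sw, rw) tuples, one per occurrence,
    holding the frequency, concept weight, structural weight, and
    corpus weight respectively.
    """
    total = 0.0
    for f, cw, sw, rw in occurrences:
        for name, weight in (("cw", cw), ("sw", sw), ("rw", rw)):
            if not 0 < weight <= 1:
                raise ValueError(f"{name} must be in (0, 1], got {weight}")
        total += f * cw * sw * rw
    return total
```

Because every weight is at most 1, each occurrence contributes no more than its raw frequency, so the weights can only discount a concept.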

1 37. (previously presented): A method according to Claim 36, further 

2 comprising: 

3 evaluating the concept weight in accordance with the formula: 
$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$

where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises a number of terms for
occurrence $j$ of each such concept $i$.

38. (previously presented): A method according to Claim 36, further 
comprising: 



evaluating the structural weight in accordance with the formula: 
where $sw_{ij}$ comprises the structural weight for occurrence $j$ of each such concept $i$.

39. (previously presented): A method according to Claim 36, further 
comprising: 

evaluating the corpus weight in accordance with the formula: 



where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for
occurrence $j$ of each such concept $i$, T comprises a total number of reference
counts of documents in the document set, and M comprises a maximum reference 
count of documents in the document set. 

40. (previously presented): A method according to Claim 36, further 
comprising: 

compressing the score in accordance with the formula: 




r tj >M 
4 $S'_i = \log(S_i + 1)$

5 where $S'_i$ comprises the compressed score for each such concept $i$.
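
As an illustration only, and not as part of the claim language, the logarithmic compression of Claim 40 can be sketched in Python. The claim does not state the base of the logarithm, so the `base` parameter (defaulting to the natural logarithm) and the function name `compress_score` are assumptions.

```python
import math


def compress_score(score, base=math.e):
    """Compress a non-negative score as log(score + 1) in the given base."""
    if score < 0:
        raise ValueError("score must be non-negative")
    return math.log(score + 1, base)
```

Adding 1 before taking the logarithm maps a score of zero to zero and keeps every compressed score non-negative, while damping the influence of very frequent concepts.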

1 41. (original): A method according to Claim 35, further comprising:

2 maintaining concepts and terms in a global stop concept vector cache; and 

3 filtering selection of the at least one concept based on the concepts and 

4 terms maintained in the global stop concept vector cache. 

1 42. (original): A method according to Claim 35, further comprising: 

2 identifying terms within at least one document in the document set; and 

3 combining the identified terms into one or more of the concepts. 

1 43. (original): A method according to Claim 42, further comprising: 

2 structuring each such identified term in the one or more concepts into 

3 canonical concepts comprising at least one of word root, character case, and word 

4 ordering. 

1 44. (original): A method according to Claim 42, further comprising: 

2 including as terms at least one of nouns, proper nouns and adjectives. 

1 Claim 45 (canceled). 

1 46. (previously presented): A method according to Claim 35, further 

2 comprising: 

3 identifying a plurality of non-seed documents; 

4 determining the similarity between each non-seed document and each 

5 cluster center; and 

6 grouping each such non-seed document into a cluster with a best fit, 

7 subject to a minimum fit criterion. 

1 47. (original): A method according to Claim 46, further comprising: 
2 forming a normalized score vector for each document comprising the

3 score associated with the at least one concept for each such concept occurring 

4 within the document; and 

5 determining the similarity as a function of the normalized score vector 

6 associated with the at least one concept for each such document. 

1 48. (previously presented): A method according to Claim 47, further

2 comprising: 

3 calculating the similarity in accordance with the formula: 



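4 $\cos \sigma_{AB} = \dfrac{\langle \vec{S}_A \cdot \vec{S}_B \rangle}{|\vec{S}_A| \, |\vec{S}_B|}$

5 where $\cos \sigma_{AB}$ comprises a similarity between a document A and a document B,

6 $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for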

7 document B. 

1 Claims 49-51 (canceled). 

1 52. (currently amended): A computer-readable storage medium 

2 holding code for providing efficient document scoring of concepts within and 

3 clustering of documents in an electronically-stored document set, comprising: 

4 code for scoring a document in an electronically-stored document set, 

5 comprising: 

6 code for determining a frequency of occurrence of at least one 

7 concept within a document; 

8 code for analyzing a concept weight reflecting a specificity of 

9 meaning for the at least one concept within the document; 

10 code for analyzing a structural weight reflecting a degree of 

11 significance based on structural location within the document for the at least one

12 concept; 

13 code for analyzing a corpus weight inversely weighing a reference 

14 count of occurrences for the at least one concept within the document; and 
15 code for evaluating a score to be associated with the at least one

16 concept as a function of the frequency, concept weight, structural weight, and 

17 corpus weight; and 

18 code for grouping the documents by score into a plurality of clusters,

19 comprising: 

20 code for identifying candidate seed documents, which are each 

21 assigned as a seed document into a cluster with a center most similar to the seed 

22 document; 

23 code for assigning each non-seed document to the cluster with the 

24 best fit; 

25 code for relocating outlier documents, comprising: 

26 code for determining similarities between the documents grouped 

27 into each cluster based on the center of the cluster and the scores assigned to each 

28 of the at least one concepts in each such document; 

29 code for dynamically determining a threshold for each cluster 

30 [[based on]] as a function of the similarities [[between the documents grouped into the

31 cluster and a center of the cluster]]; and

32 code for identifying and reassigning the documents with the

33 similarities falling outside the threshold.

1 53. (currently amended): An apparatus for providing efficient 

2 document scoring of concepts within and clustering of documents in an 

3 electronically-stored document set, comprising: 

4 means for scoring a document in an electronically-stored document set, 

5 comprising: 

6 means for determining a frequency of occurrence of at least one 

7 concept within a document; 

8 means for analyzing a concept weight reflecting a specificity of 

9 meaning for the at least one concept within the document; 






Response to Final Office Action 
Docket No. 013.0207.US.UTL 



10 means for analyzing a structural weight reflecting a degree of 

11 significance based on structural location within the document for the at least one

12 concept; 

13 means for analyzing a corpus weight inversely weighing a

14 reference count of occurrences for the at least one concept within the document; 

15 and 

16 means for evaluating a score to be associated with the at least one 

17 concept as a function of the frequency, concept weight, structural weight, and 

18 corpus weight; and 

19 means for grouping the documents by score into a plurality of clusters, 

20 comprising: 

21 means for identifying candidate seed documents, which are each 

22 assigned as a seed document into a cluster with a center most similar to the seed 

23 document; 

24 means for assigning each non-seed document to the cluster with 

25 the best fit; 

26 means for relocating outlier documents, comprising: 

27 means for determining similarities between the documents grouped 

28 into each cluster based on the center of the cluster and the scores assigned to each 

29 of the at least one concepts in each such document; 

30 means for dynamically determining a threshold for each cluster 

31 [[based on]] as a function of the similarities [[between the documents grouped into the

32 cluster and a center of the cluster]]; and

33 means for identifying and reassigning the documents with the 

34 similarities falling outside the threshold. 